Abstract



A search is presented for a massive particle, generically referred to as a Z', decaying
into a $t\bar{t}$ pair. The search focuses on Z' resonances that are sufficiently massive to
produce highly Lorentz-boosted top quarks, which yield collimated decay products
that are partially or fully merged into single jets. The analysis uses new methods to
analyze jet substructure, providing suppression of the non-top multijet backgrounds.
The analysis is based on a data sample of proton-proton collisions at a center-of-mass
energy of 7 TeV, corresponding to an integrated luminosity of 5 fb$^{-1}$. Upper limits in
the range of 1 pb are set on the product of the production cross section and branching
fraction for a topcolor Z' modeled for several widths, as well as for a Randall-Sundrum
Kaluza-Klein gluon. In addition, the results constrain any enhancement in
$t\bar{t}$ production beyond expectations of the standard model for $t\bar{t}$ invariant mass larger
than 1 TeV/$c^2$.

1 Introduction

Among scenarios for physics beyond the standard model (SM) are possibilities of new gauge
interactions with large couplings to third-generation quarks [1-11]. These interactions predict
new massive states, generically referred to as Z' bosons, that can decay into $t\bar{t}$ pairs. Typical
examples are the topcolor Z' described in Refs. [4-6], and the Randall-Sundrum Kaluza-Klein
(KK) gluons of Ref. [12]. Other models [13-16] have recently been proposed to resolve the
discrepancy in the forward-backward asymmetry in $t\bar{t}$ production reported at the Tevatron
[17-21]. Model-independent studies of the implications of a large forward-backward asymmetry
suggest that a strong enhancement of the production cross section for $t\bar{t}$ pairs would be
expected at the Large Hadron Collider (LHC) for invariant masses $m_{t\bar{t}} > 1$ TeV/$c^2$, if the
observed discrepancy with the predictions of the SM is due to new physics at
some large mass scale [22, 23]. Searches for new physics in top-pair production have been
performed at the Tevatron [24-26], and provide the most stringent lower limits on the mass ($m_{Z'}$)
of narrow-width ($\Gamma_{Z'}$) resonances, e.g. excluding a topcolor $t\bar{t}$ resonance with
$\Gamma_{Z'}/m_{Z'} = 1.2\%$ for masses below 0.8 TeV/$c^2$.

In this Letter, several models of resonant $t\bar{t}$ production are considered, including a Z'
resonance with a narrow width of 1% of the mass, a Z' resonance with a moderate width of 10%
of the mass, as well as broader KK gluon (g') states [12]. An enhancement over $t\bar{t}$ continuum
production at large $t\bar{t}$ invariant masses is also considered.

This study examines decays of produced $t\bar{t}$ pairs in the all-hadronic channel, taking advantage
of the large (46%) branching fraction of $t\bar{t} \to W^+ b\, W^- \bar{b} \to$ 6 quarks, and focuses on final states
with $m_{t\bar{t}} > 1$ TeV/$c^2$. For lower masses, the background from quantum-chromodynamic (QCD)
production of non-top multijet events makes the search prohibitively difficult in this channel.
At high $m_{t\bar{t}}$, using new techniques in jet reconstruction to identify jet substructure [27-31], it is
possible to study highly boosted top quarks ($E/m_t c^2 > 2$, where $E$ and $m_t$ are the energy and
mass of the top quark). The decay products of these highly boosted top quarks are collimated,
and are partially or fully merged into single jets with several separate subjets corresponding
to the final-state quarks (one from the bottom quark, and two light-flavor quarks from the
W decay). The data sample corresponds to an integrated luminosity of 5 fb$^{-1}$ collected by the
Compact Muon Solenoid (CMS) experiment [32] in proton-proton collisions at a center-of-mass
energy of 7 TeV at the LHC.

In this Letter, Section 2 describes the CMS detector and the event reconstruction.
Section 3 explains the strategy for the analysis and the derivation of the efficiency and
misidentification probability of the substructure tools that were used. Section 4 gives the
systematic uncertainties in the analysis. Section 5 describes the statistical methodology used.
Section 6 presents a summary of the results.

2 CMS detector, event samples, and preselection 

The CMS detector is a general-purpose detector that uses a silicon tracker, as well as finely
segmented lead-tungstate crystal electromagnetic (ECAL) and brass/scintillator hadronic (HCAL)
calorimeters. These subdetectors have full azimuthal coverage and are contained within the
bore of a superconducting solenoid that provides a 3.8 T axial magnetic field. The CMS detector
uses a polar coordinate system with the polar angle $\theta$ defined relative to the direction ($z$)
of the counterclockwise proton beam. The pseudorapidity $\eta$ is defined as $\eta = -\ln\tan(\theta/2)$,
which agrees with the rapidity $y = \frac{1}{2}\ln\frac{E + p_z c}{E - p_z c}$ for objects of negligible mass,
where $E$ is the energy and $p_z$ is the longitudinal momentum of the particle. Charged particles are
reconstructed in the tracker for $|\eta| < 2.5$. The surrounding ECAL and HCAL provide coverage for
photon, electron, and jet reconstruction for $|\eta| < 3$. The CMS detector also has extensive forward
calorimetry that is not used in this analysis. Muons are measured in gas-ionization detectors
embedded in the steel return yoke outside the solenoid.

Events were selected with an online trigger system, with decisions based on the transverse
momentum ($p_T$) of a single jet measured in the calorimeters. The instantaneous luminosity
increased with time, hence two thresholds were used for different running periods. Most of
the data were collected with a threshold of jet $p_T > 300$ GeV/$c$, and the rest with a threshold of
240 GeV/$c$. Offline, one jet is required to satisfy $p_T > 350$ GeV/$c$.

Several Monte Carlo (MC) simulated samples are used in the analysis. The continuum SM $t\bar{t}$
background is simulated with the MadEvent/MadGraph 4.4.12 [33] and PYTHIA 6.4.22 [34]
event generators. The MadGraph generator is also used to model generic high-mass resonances
decaying to SM top pairs. In particular, a model is implemented with a Z' that has
SM-like fermion couplings and mass between 1 and 3 TeV/$c^2$. However, in the MC generation
of the Z', only decays to $t\bar{t}$ are simulated. The width of the resonance is set to 1% and 10% of $m_{Z'}$,
so as to check the predictions for a narrow and a moderate resonance width, respectively. Here,
the 10% width is comparable to the detector resolution. The PYTHIA 8.145 event generator [35]
is used to generate Randall-Sundrum KK gluons with masses $m_{g'}$ = 1, 1.5, 2, and 3 TeV/$c^2$, and
widths of $\approx 0.2\,m_{g'}$. These Randall-Sundrum gluons have branching fractions to $t\bar{t}$ pairs of 0.93,
0.92, 0.90, and 0.87, respectively. PYTHIA 6 is also used to generate non-top multijet events for
background studies, cross-checks, and for calculating correction factors. The CTEQ6L [36]
parton distribution functions (PDF) are used in the simulation. The detector response is simulated
using the CMS detector simulation based on Geant4 [37].

Events are reconstructed using the particle-flow algorithm [38], which identifies all
reconstructed observable particles (muons, electrons, photons, charged and neutral hadrons) in an
event by combining information from all subdetectors. Event selection begins with removal of
beam background by requiring that events with at least 10 tracks have at least 25% of the tracks
satisfying high-purity tracking requirements [39]. The events must have a well-reconstructed
primary vertex, and only charged particles identified as being consistent with the highest $\sum p_T^2$
interaction vertex are considered, reducing the effect of multiple interactions per beam crossing
(pile-up) by $\approx$ 60%.

The selected particles, after removal of charged hadrons from pile-up and isolated leptons, are
clustered into jets using the Cambridge-Aachen (CA) algorithm with a distance parameter of
$R = 0.8$ in $\eta$-$\phi$ space, where $\phi$ is the azimuthal angle [40, 41], as implemented in the
FastJet software package version 2.4.2 [42, 43]. The CA algorithm sequentially merges into single
objects, by four-vector addition, the pairs of particle clusters that are closest in the distance
measure $d_{ij} = \Delta R_{ij}^2/R^2$, where $\Delta R_{ij} = \sqrt{(\Delta\eta)^2 + (\Delta\phi)^2}$ and $R = 0.8$, until the
minimum $d_{ij}$ is greater than or equal to the so-called beam distance $d_{iB}$, which equals unity in
the CA algorithm. More generally these distance measures are equal to
$d_{ij} = \min(p_{T,i}^{2n}, p_{T,j}^{2n})\,\Delta R_{ij}^2/R^2$ and $d_{iB} = p_{T,i}^{2n}$.
In the more common cases of the $k_T$ and anti-$k_T$ algorithms [44], $n = 1$ and $n = -1$,
respectively; for the CA algorithm $n = 0$, and hence only angular information is used in
the clustering. When the beam distance for particle cluster $i$ is smaller than all of the other $d_{ij}$,
particle cluster $i$ is identified as a jet and the clustering proceeds for the remaining particles
in the event. Jet energy scale corrections are applied as documented in Ref. [45]. All jets are
required to satisfy jet-quality criteria [38], as well as $|y| < 2.5$. The rapidity is used in this case
because the jets acquire a finite mass as part of the imposed jet-quality criteria.
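
The clustering rule described above can be sketched in a few lines of Python. This is only an illustrative, brute-force version of the generalized distance measures with $n = 0$ (CA), not the FastJet implementation used in the analysis; the helper names (`massless`, `pt_of`, `eta_of`, `d_pair`, `cluster`) and the three input particles are hypothetical, and every input is assumed to have nonzero transverse momentum.

```python
import math
from itertools import combinations

def massless(pt, eta, phi):
    """Build a massless four-vector (px, py, pz, E) from pT, eta, phi."""
    return (pt * math.cos(phi), pt * math.sin(phi),
            pt * math.sinh(eta), pt * math.cosh(eta))

def pt_of(p):
    return math.hypot(p[0], p[1])

def eta_of(p):
    # Pseudorapidity: eta = -ln tan(theta/2) = asinh(pz / pT); needs pT > 0.
    return math.asinh(p[2] / pt_of(p))

def d_pair(a, b, R, n):
    """Pairwise distance d_ij = min(pT_i^2n, pT_j^2n) * dR_ij^2 / R^2."""
    dphi = abs(math.atan2(a[1], a[0]) - math.atan2(b[1], b[0]))
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    dr2 = (eta_of(a) - eta_of(b)) ** 2 + dphi ** 2
    return min(pt_of(a) ** (2 * n), pt_of(b) ** (2 * n)) * dr2 / R ** 2

def cluster(particles, R=0.8, n=0):
    """Sequential recombination; n = 0 is CA, n = 1 kT, n = -1 anti-kT."""
    objs, jets = list(particles), []
    while objs:
        # Start from the smallest beam distance d_iB = pT_i^2n.
        i_beam = min(range(len(objs)), key=lambda i: pt_of(objs[i]) ** (2 * n))
        best = (pt_of(objs[i_beam]) ** (2 * n), i_beam, None)
        for i, j in combinations(range(len(objs)), 2):
            d = d_pair(objs[i], objs[j], R, n)
            if d < best[0]:
                best = (d, i, j)
        _, i, j = best
        if j is None:
            # No pair is closer than the beam distance: object i becomes a jet.
            jets.append(objs.pop(i))
        else:
            # Merge the closest pair by four-vector addition.
            merged = tuple(x + y for x, y in zip(objs[i], objs[j]))
            objs = [o for k, o in enumerate(objs) if k not in (i, j)]
            objs.append(merged)
    return jets

# Hypothetical input: two nearby particles and one far away in eta-phi.
parts = [massless(100.0, 0.0, 0.0), massless(50.0, 0.3, 0.2),
         massless(80.0, -1.5, 3.0)]
jets = cluster(parts)
```

With these inputs the first two particles are separated by $\Delta R \approx 0.36 < R$ and are merged, while the third remains its own jet.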

3 Analysis method

The analysis is designed for cases in which the $t\bar{t}$ system has sufficient energy for the decay
products of each top quark to be emitted into a single hemisphere, implying that
$E/m_t c^2 > 2$. As a consequence, the top quarks can be either partially merged when only the W decay
products are merged into a single jet, or fully merged when all top decay products are merged
into a single jet. Thus, this analysis becomes inefficient for low masses, and upper limits are
not evaluated for Z' masses below 1 TeV/$c^2$.

The largest background in this search is the non-top multijet (NTMJ) background. This is 
highly suppressed by requirements on the jet mass and substructure. The remaining NTMJ 
background is estimated by computing the probability for non-top jets to pass the top-jet se- 
lections (misidentification probability) in control regions of the data. These control regions 
are constructed by inverting substructure selections while keeping mass selections fixed. This 
mistagging probability is then applied to the signal region to estimate the contribution from 
the NTMJ background. 



In this section, the jet topologies in the analysis are defined in Sec. 3.1, the signal estimate is
described in Sec. 3.2, and the background estimate is shown in Sec. 3.3. Finally, the results of
the event selection and the background estimate are presented in Sec. 3.4.

3.1 Analysis of jet topologies 

The events are classified into two categories, depending on the number of final-state jets that 
appear in each hemisphere. The 1+1 channel comprises dijet events in which each jet corre- 
sponds to a fully merged top-quark candidate, denoted as a Type-1 top-quark candidate. The 
1+2 channel comprises trijet events that fail the 1+1 criteria, with a Type-1 top-quark candidate 
in one hemisphere, and at least two jets in the other, one being a jet from a b quark (although no 
identification algorithms are applied) and the other a merged jet from a W. These two separate 
jets define a Type-2 top-quark candidate in the 1+2 channel. Further channels, such as 2+2, 
which would correspond to two Type-2 top quarks, are not considered in this analysis. The 
1+1 and 1+2 selections are now discussed in detail. 

The 1+1 events are required to have at least two Type-1 top-quark candidates, each
reconstructed with $p_T > 350$ GeV/$c$. Both candidates are tagged by a top-tagging algorithm [27, 28]
to define merged top jets. In the case of more than two top-tagged jets, the two top-tagged jets
with the highest $p_T$ are considered. The top-tagging algorithm is based on the decomposition
of a jet into subjets, by reversing the final steps of the CA jet-clustering sequence. In this
decomposition, particles that have small $p_T$ or are at large angles relative to the parent cluster are
ignored. At least three subjets are required in each jet. While the subjets of generic jets tend to
be close together, and one of them often dominates the jet energy because of gluon emission
in the final state, the decay products of the top quark share the jet energy more equally and
emerge at wider angles. The mass of the summed four-vector of the constituents of the hard
jet must be consistent with the mass of a top quark, $m_t \approx 175$ GeV/$c^2$ ($140 < m_{\mathrm{jet}} < 250$ GeV/$c^2$,
where the values chosen are optimized through MC simulation). Figure 1a shows the expected
jet mass for the Z' signal from MC as a dotted histogram, and the expected jet mass for the
NTMJ background from MC as a solid yellow histogram. As expected, the Z' signal has a
peak at the top mass corresponding to fully merged top jets, and has a shoulder at the W mass
corresponding to partially merged top jets. The minimum pairwise invariant mass of the three
subjets of highest $p_T$ is required to be $> 50$ GeV/$c^2$, because the combination with the minimum
pairwise mass often ($> 60\%$) consists of the jet remnants from the W decay.
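
The three top-tagging requirements listed above (at least three subjets, a jet mass window, and a minimum pairwise subjet mass) can be written as a simple predicate. The sketch below is illustrative only: the function names are hypothetical, four-vectors are `(px, py, pz, E)` tuples in GeV with $c = 1$, the jet mass is approximated by the mass of the summed subjets rather than of all jet constituents, and the subjet decomposition itself is assumed to have been done already.

```python
import math
from itertools import combinations

def inv_mass(*vecs):
    """Invariant mass of the sum of four-vectors (px, py, pz, E)."""
    px, py, pz, e = (sum(v[k] for v in vecs) for k in range(4))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))

def is_top_tagged(subjets, mass_window=(140.0, 250.0), min_pair_mass=50.0):
    """Apply the three selections quoted in the text to a list of subjets."""
    if len(subjets) < 3:
        return False
    # Assumption: jet mass taken from the summed subjet four-vectors.
    m_jet = inv_mass(*subjets)
    if not (mass_window[0] < m_jet < mass_window[1]):
        return False
    # Minimum pairwise mass among the three highest-pT subjets.
    lead3 = sorted(subjets, key=lambda v: math.hypot(v[0], v[1]),
                   reverse=True)[:3]
    m_min = min(inv_mass(a, b) for a, b in combinations(lead3, 2))
    return m_min > min_pair_mass

# Hypothetical examples: a symmetric three-body decay of a 173 GeV object,
# and a jet with one hard subjet plus two soft, nearly collinear subjets.
e = 173.0 / 3.0
top_like = [(e, 0.0, 0.0, e),
            (-e / 2, e * math.sqrt(3) / 2, 0.0, e),
            (-e / 2, -e * math.sqrt(3) / 2, 0.0, e)]
qcd_like = [(200.0, 0.0, 0.0, 200.0),
            (20.0, 2.0, 0.0, math.sqrt(404.0)),
            (5.0, 0.0, 1.0, math.sqrt(26.0))]
top_result = is_top_tagged(top_like)
qcd_result = is_top_tagged(qcd_like)
```

The symmetric example passes all three requirements, while the second example fails the jet mass window.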

The 1+2 events are required to have exactly one hemisphere containing a top-tagged Type-1
candidate with $p_T > 350$ GeV/$c$. That is, only events that fail the 1+1 criteria are considered
for the 1+2 selection. In the hemisphere opposite the top-tagged Type-1 candidate, there must
be at least two jets, one identified as a W-jet candidate, with $p_T > 200$ GeV/$c$, and another jet
from a b quark (although no identification algorithm is used) with $p_T > 30$ GeV/$c$. The W jet
is required to be tagged by a W-tagging algorithm, based on the jet pruning technique [29, 30].
The W-tagging algorithm requires two subjets, a total jet mass consistent with the mass of
the W boson, $m_W = 80.4$ GeV/$c^2$ ($60 < m_{\mathrm{jet}} < 100$ GeV/$c^2$), and an acceptable "mass-drop"
parameter $\mu$ of the final subjets relative to the hard jet [31]. The mass drop $\mu$ is defined as
the ratio of the mass of the more massive subjet $m_1$ to the mass of the complete jet $m_{\mathrm{jet}}$, and
is required to be smaller than 40% ($m_1/m_{\mathrm{jet}} \equiv \mu < 0.4$). This selection helps to discriminate
against generic jets, which usually have larger $\mu$ values. The W-jet and b-jet candidates combine
to form the Type-2 top-quark candidate, whose mass must be consistent with that of the top
quark ($140 < m_{\mathrm{jet}} < 250$ GeV/$c^2$, where the values chosen are optimized using MC simulation).
When there are more than two jets in the Type-2 hemisphere, the b-quark candidate is taken as
the one closest to the W-tagged jet in $\eta$-$\phi$ space.
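
As a rough illustration of the W-tagging requirements above, the following sketch applies the two-subjet, mass-window, and mass-drop selections to a pruned jet. The function name and the numerical inputs are hypothetical; masses are in GeV/$c^2$, and the pruning step is assumed to have been performed upstream.

```python
def is_w_tagged(subjet_masses, jet_mass, mass_window=(60.0, 100.0), mu_max=0.4):
    """W tag: exactly two subjets, jet mass in window, mass drop mu < mu_max."""
    if len(subjet_masses) != 2:
        return False
    if not (mass_window[0] < jet_mass < mass_window[1]):
        return False
    # Mass drop: mass of the heavier subjet divided by the full jet mass.
    mu = max(subjet_masses) / jet_mass
    return mu < mu_max

# Hypothetical inputs for illustration.
w_like = is_w_tagged([20.0, 10.0], 80.4)       # mu is about 0.25
generic = is_w_tagged([50.0, 10.0], 80.4)      # mu is about 0.62
off_window = is_w_tagged([20.0, 10.0], 120.0)  # outside the W mass window
```

Only the first example satisfies all three requirements.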

The jet-pruning technique used to select W jets removes a portion of the jet, which is not
accounted for by the ordinary jet energy scale corrections. The jet corrections used in this analysis
are derived from unpruned jets, and the impact on pruned jets is therefore investigated using
a dijet MC sample. In particular, the $p_T$ of reconstructed pruned jets is compared to the $p_T$
of matched generator-level particle jets that also underwent the pruning procedure, and the
difference of 2-3% observed in absolute response suggests a systematic uncertainty in the jet
energy from this source. An uncorrelated 3% uncertainty is therefore added to the uncertainties
for standard jet energy corrections. This uncertainty is applied for both the top-tagging and
jet-pruning algorithms, and is added in quadrature to the general jet energy scale uncertainties
of Ref. [45], which are $\approx$ 2-4%, depending on $p_T$ and rapidity.
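
To make the size of the combined uncertainty concrete, the quadrature sum of the 3% substructure term with the quoted 2-4% range can be evaluated directly. The function name below is hypothetical, and the two end points of the range are used only as examples.

```python
import math

def total_jes_uncertainty(standard, substructure=0.03):
    """Quadrature sum of two uncorrelated relative uncertainties."""
    return math.hypot(standard, substructure)

# End points of the 2-4% range quoted for the standard corrections.
low = total_jes_uncertainty(0.02)
high = total_jes_uncertainty(0.04)
print(f"total jet energy uncertainty: {low:.1%} to {high:.1%}")
```

Under these inputs the combined relative uncertainty lies between about 3.6% and 5%.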

3.2 Signal efficiency 

For both the Z' signal and the (subdominant) $t\bar{t}$ background, the efficiencies of the analysis
selection are estimated from Monte Carlo simulation, with scale factors to account for measured
differences with respect to the data. Figure 1b shows the efficiency for tagging a true top jet as
a function of $p_T$. Above $\approx$ 500 GeV/$c$, the efficiency plateaus at 50-60%.

Three scale factors are applied to the efficiency. The first scale factor is used to correct for the
trigger in simulated signal events. Its value is equal to the trigger efficiency: it rises from 75%
(60%) for 1+1 (1+2) events at $m_{t\bar{t}} = 1.0$ TeV/$c^2$ and becomes fully efficient for $m_{t\bar{t}} > 1.5$ TeV/$c^2$.
The value of the trigger efficiency is estimated per jet on a sample of simulated NTMJ events
passing the top-quark candidate selections, and is then applied to simulated signal events. The
systematic uncertainty is assigned to be 50% of the trigger inefficiency from MC. The difference
between the measured trigger efficiency in data and MC is roughly in this range, but suffers
from large statistical uncertainties. The second scale factor is used to correct for any differences
in jet-energy scale for the subjets and for the full jets. This is referred to as the subjet jet-energy
scale factor. The third scale factor is used to correct for the impact of any differences between
data and MC in efficiencies for finding jets with substructure. This is referred to as the subjet
selection-efficiency scale factor. The derivation of the second and third scale factors is now
discussed in detail.

The second and third scale factors are both determined in a control sample comprising events with
a single muon (referred to as the muon control sample), usually from the decay $t \to Wb$
with $W \to \mu\nu$, and at least two jets. The event selection is nearly identical to that in Ref. [46],
except for larger $p_T$ requirements for the jets. The leading jet is required to satisfy $p_T > 200$
GeV/$c$, and the sub-leading jet is required to satisfy $p_T > 30$ GeV/$c$. Because there are few
fully merged (Type-1) top jets in this sample, it is not possible to construct a sufficiently large
number of true Type-1 top jets. Instead, the characterization of jets with substructure is studied
with moderately boosted $t\bar{t}$ events where there is a large fraction of partially merged (Type-2)
top-quark candidates and the W jets within them.

Figure 1: (CMS Simulation, $\sqrt{s}$ = 7 TeV.) (a) The simulated jet mass for NTMJ MC (light yellow
histogram) and Z' MC (open histogram). (b) The Type-1 per-jet top-tagging efficiency for Z' MC
events is shown as red squares with error bars (see Sec. 3.2), and the Type-1 per-jet mistag
probability for top tagging measured in data is shown as black circles with error bars (see
Sec. 3.3), both as a function of jet $p_T$.

The events in the muon control sample used to study W-tagging are dominated by $t\bar{t}$ decays,
and the leading-jet $p_T$ requirement ($> 200$ GeV/$c$) favors the topology in which the two top
quarks are produced back-to-back, thereby facilitating jet merging on the side of the "hadronic"
top quark, containing jets from $t \to Wb \to q\bar{q}'b$. In the other hemisphere, these events contain
one isolated muon, consistent with originating from the primary collision vertex, with
$p_T > 45$ GeV/$c$ and $|\eta| < 2.1$. Events are rejected if they contain other isolated electrons or muons
within $|\eta| < 2.1$ with $p_T > 15$ GeV/$c$ or $p_T > 10$ GeV/$c$, respectively. The remaining events are
then required to have $\geq 2$ jets with $p_T > 30$ GeV/$c$, and at least one jet with $p_T > 200$ GeV/$c$.
Unlike the all-hadronic channel discussed in this Letter, in the muon control sample there are
two well-separated jets originating from b quarks, so in order to enhance the $t\bar{t}$ fraction of this
control region, the events are required to contain at least one jet tagged with a secondary-vertex
b-tagging algorithm [47], also used in Ref. [46]. The b-tagging algorithm combines at least two
tracks into at least one secondary vertex, and forms a discriminating variable based on the
three-dimensional decay length of the vertex.

The subjet jet-energy scale factor is estimated by extracting a W mass peak in the muon control
sample, and comparing the peaks in data and MC. The mass distribution of the jet of largest
mass in the hadronic hemisphere is shown in Fig. 2a.

In this figure, the MC $t\bar{t}$ contribution is normalized to the approximate
next-to-next-to-leading-order (NNLO) cross section for inclusive $t\bar{t}$ production of 163 pb [48-50].
The non-W multijet component is based on sidebands in data that have the muon isolation criterion
reversed. The contributing spectrum is normalized through a fit to the missing transverse energy.
With the stringent criteria of this analysis, very few simulated W+jets events pass the required
selections. The shape of the W+jets distribution is therefore taken to be the same as that of the
generic non-W multijet background. This is acceptable because the mass structure within the
candidate top-quark jets is very similar in these two samples, and the sideband data have many
more events that pass the selection criteria. The W+jets contribution is normalized to the
inclusive W production cross section of $\sigma_{W\to\ell\nu} = 31.3 \pm 1.6$ nb computed at NNLO with
FEWZ [51]. A fit of the sum of two Gaussian functional forms to data is given by the solid line,
and a similar fit to the simulated events is shown as a dashed line. The centers of the main
Gaussian distributions in data and MC are $m_W^{\mathrm{DATA}} = 83.0 \pm 0.7$ GeV/$c^2$ and
$m_W^{\mathrm{MC}} = 82.5 \pm 0.3$ GeV/$c^2$, respectively. The subjet jet-energy scale factor for W jets is
determined by taking the ratio of these two values, and equals
1.01 ± 0.01, including statistical uncertainty only.
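
The quoted ratio can be reproduced from the two fitted peak positions. The sketch below assumes that the two statistical uncertainties are uncorrelated and are propagated with the standard first-order formula for a ratio; the text does not state how the uncertainty was obtained, so this is only a plausibility check, and the function name is hypothetical.

```python
import math

def ratio_with_uncertainty(a, sigma_a, b, sigma_b):
    """Ratio a/b with first-order propagation of uncorrelated uncertainties."""
    r = a / b
    return r, r * math.hypot(sigma_a / a, sigma_b / b)

# Fitted W mass peak positions in data and MC, in GeV/c^2, from the text.
sf, sf_err = ratio_with_uncertainty(83.0, 0.7, 82.5, 0.3)
print(f"subjet jet-energy scale factor: {sf:.2f} +/- {sf_err:.2f}")
```

Rounded to two decimal places, this gives the same 1.01 and 0.01 as quoted.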

The subjet selection-efficiency scale factor is estimated by comparing the observed selection
efficiency in the muon control sample in data and MC. The ratio of the number of events in
the W mass window ($60 < m_{\mathrm{jet}} < 100$ GeV/$c^2$) in Fig. 2a after W tagging, to the number of
events in the muon control sample, defines the W selection efficiency within the mass window.
For data and MC the values are $\epsilon_{m}^{\mathrm{DATA}} = 0.49 \pm 0.01$ and $\epsilon_{m}^{\mathrm{MC}} = 0.50 \pm 0.01$, respectively.
Similarly, the mass-drop selection is checked in data and Monte Carlo, following the W-mass
window selection, and a similar efficiency is extracted, with the observed values being
$\epsilon_{\mu}^{\mathrm{DATA}} = 0.64 \pm 0.01$ and $\epsilon_{\mu}^{\mathrm{MC}} = 0.64 \pm 0.01$. Combining efficiencies of the mass-drop and
mass selections, the subjet selection-efficiency scale factor, to be applied to the MC to obtain the
same efficiency as in data, is determined to be 0.97 ± 0.03. The same scale factor and uncertainty
are assumed for Type-1 jets, which is consistent with results from the statistics-limited control
sample of muon events that contain Type-1 jets. As two top-quark tags are required in each event,
the correction for both 1+1 and 1+2 events is the square of the single-tag scale factor, yielding
0.94 ± 0.06.
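
The per-event correction follows from squaring the single-tag scale factor. The short check below assumes first-order propagation with the same scale factor applied to both tags (so the uncertainty is fully correlated between them); the function name is hypothetical.

```python
def squared_with_uncertainty(sf, sf_err):
    """Square of a scale factor; d(sf^2) = 2 * sf * d(sf) at first order."""
    return sf ** 2, 2.0 * sf * sf_err

# Single-tag scale factor and uncertainty from the text.
event_sf, event_sf_err = squared_with_uncertainty(0.97, 0.03)
print(f"two-tag scale factor: {event_sf:.2f} +/- {event_sf_err:.2f}")
```

This gives 0.9409 with an uncertainty of about 0.058, which rounds to the quoted 0.94 and 0.06.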

The Type-1 jet selection cannot be checked at the same level of precision as the W-jet selection
because of the small number of Type-1 jets in the muon control sample. However, the Type-2
top-quark candidate selection can be tested in the muon control sample, as shown in Fig. 2b.
The same procedure is used to construct these Type-2 top-quark candidates as in the 1+2
selection, including the W-mass and mass-drop selections. Within the statistical uncertainty, good
agreement is observed in the data and simulation. Since the selection of Type-2 top-quark
candidates considers a boosted three-body decay as well as a W tag, the agreement between the
characteristics of candidates in data and in MC provides further confidence for the assumption
that the scale factor for the efficiency of W-tagging is appropriate for three-body decays such
as the Type-1 top-quark system.

To check the dependability of our assumption, namely that the data-to-MC scale factors are
the same for the muon control sample as for the Z' signal, the scale factor is measured in a
control sample using more stringent kinematic requirements to select a part of phase space
similar to that of a Z' signal with $m_{Z'} \approx 1$ TeV/$c^2$. The b-tagging requirement is dropped in order
to collect more $t\bar{t}$ events. Also, to capture the kinematics of the background, instead of using
the distribution for W+jets from the sidebands in data without isolated muon candidates, as is
done in the fit to the W mass, the distribution for W+jets is taken from the W+jets MC.

In all samples with large $p_T$ thresholds, the data-to-MC scale factor is found to be consistent
with the measured value of 0.97 ± 0.03. The selection that provides sideband regions most
similar in kinematics to that of the signal region is a requirement that the Type-2 top candidate
satisfy $p_T > 400$ GeV/$c$. Figure 3a shows the $p_T$ distribution for the Type-2 candidates in the muon
control sample, and Fig. 3b shows the $p_T$ of the W jet within the Type-2 top-quark candidate,
as defined by the jet of largest mass in the event. Arbitrarily normalized distributions for a Z'
signal with $m_{Z'} = 1$ TeV/$c^2$ are overlaid for comparison. For completeness, Figs. 4a and 4b show
plots identical to Figs. 2a and 2b but with selections that require Type-2 top-quark candidates
with $p_T > 400$ GeV/$c$. The scale factor extracted from this higher-$p_T$ subsample is 0.99 ± 0.11,
which is consistent with the quoted data-to-MC scale factor of 0.97 ± 0.03.

3.3 Background estimate 

Since this analysis focuses on signatures with high-$p_T$ jets, the main backgrounds expected are
from SM non-top multijet production and $t\bar{t}$ production. The background from NTMJ production
is estimated from sidebands in the data as described below. For the Z' masses considered
in this analysis, the irreducible SM $t\bar{t}$ component is significantly smaller than the NTMJ
background contribution, and is therefore estimated from MC simulation using the same correction
factors as found for the Z' MC described in Sec. 3.2. It is normalized to the approximate NNLO
cross section of 163 pb used in Sec. 3.2.



In both the 1+1 and 1+2 channels, estimates of the dominant NTMJ background are obtained from
data as follows. First, the probability is estimated for mistaking a non-top jet as a top-quark jet
through the top-tagging algorithm. This procedure defines the mistag probability ($P_m$). Higher
momentum jets have a larger probability to radiate gluons, and as the jet $p_T$ increases, they
are more likely to have top-like substructure and thereby satisfy a top tag [52]. The mistag
probability is therefore obtained as a function of total jet $p_T$, using the following procedure.
Events with $\geq 3$ jets are selected for the 1+2 topology, with the three leading jets in the event
required to pass $p_T$ thresholds of 350, 200, and 30 GeV/$c$, respectively, without any requirements
placed on the jet mass of the Type-1 candidate. The mass of the W-boson candidate within the
Type-2 candidate is required to fall within the W-boson mass window, and the invariant mass
of the Type-2 candidate is required to fall within the top-quark mass window. However, the
mass-drop requirement is inverted ($\mu > 0.4$) to define a signal-depleted sideband. The small
contribution from the SM $t\bar{t}$ continuum is subtracted using MC expectation, and the mistag
probability $P_m(p_T)$ is defined by the fraction of Type-1 candidates that are top-tagged, as a
function of their $p_T$. The resulting mistag probability appears in Fig. 1b.

Figure 2: (a) The mass of the highest-mass jet (W jet), and (b) the mass of the Type-2 top
candidate (W + b), in the hadronic hemisphere of moderately boosted events in the muon control
sample. The data are shown as points with error bars, the $t\bar{t}$ Monte Carlo events in dark red,
the W+jets Monte Carlo events in lighter green, and non-W multijet (non-W MJ) backgrounds
are shown in light yellow (see Ref. [46] for details of the non-W MJ distribution derivation). The
jet mass is fitted to a sum of two Gaussians in both data (solid line) and MC (dashed line), the
latter of which lies directly behind the solid line for most of the region.

Figure 3: (a) $p_T$ of the Type-2 top-quark candidate in the muon control sample. The color
scheme is the same as in Figs. 2a and 2b. (b) $p_T$ of the W candidate from within the Type-2
top-quark candidate, after a selection on the jet mass of the highest-mass jet in the muon control
sample. Overlaid on both (a) and (b) are the corresponding distributions from a Z' MC signal
with $m_{Z'} = 1$ TeV/$c^2$ (with arbitrary normalization for visualization) to compare kinematics in
the muon control region to the signal region.

Figure 4: (a) The mass of the highest-mass jet (W jet), and (b) the mass of the Type-2 top
candidate (W + b), in the hadronic hemisphere of moderately boosted events in the muon control
sample. The figure corresponds to Figs. 2a and 2b, except there is an additional requirement on
the Type-2 top candidate $p_T$ to be similar to the signal region. Figure 3a shows the distribution
of the Type-2 top candidate $p_T$.



Next, the 1+2 and 1+1 samples are defined using a loose selection: (i) in trijet events, two jets in
one hemisphere are required to pass the Type-2 selection, and (ii) in dijet events, one randomly
chosen jet is required to pass the Type-1 selection. In both cases, the other high-$p_T$ jet in the
event (the probe jet) is not required to be top-tagged. These samples are dominated by NTMJ
events. For each event $i$ in the loose selection, the probability that the event would pass the
full selection in the signal region ($P^{i}_{\mathrm{NTMJ}}$) equals the mistag probability ($P_m(p_T)$), evaluated at
the $p_T$ of the probe jet in event $i$ ($p_T^{i}$),

$$P^{i}_{\mathrm{NTMJ}} = P_m(p_T^{i}). \qquad (1)$$

The total number of NTMJ events ($N_{\mathrm{NTMJ}}$) is then equal to the sum of the weights from Eq. (1),

$$N_{\mathrm{NTMJ}} = \sum_{i=1}^{N_{\mathrm{loose}}} P^{i}_{\mathrm{NTMJ}} = \sum_{i=1}^{N_{\mathrm{loose}}} P_m(p_T^{i}), \qquad (2)$$

where $N_{\mathrm{loose}}$ is the number of events passing the loose selection and the other quantities are
defined above.
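
The weighted sum over loosely selected events can be illustrated with a binned mistag probability. Everything numerical in this sketch is hypothetical (the bin edges, the rates, and the probe-jet $p_T$ values are invented for illustration and are not the measured values shown in Fig. 1b); the function names are likewise assumptions.

```python
import bisect

def mistag_probability(pt, bin_edges, rates):
    """Look up a binned P_m(pT); values outside the range use the edge bins."""
    idx = bisect.bisect_right(bin_edges, pt) - 1
    idx = min(max(idx, 0), len(rates) - 1)
    return rates[idx]

def ntmj_estimate(probe_pts, bin_edges, rates):
    """Sum of per-event mistag weights over all loosely selected events."""
    return sum(mistag_probability(pt, bin_edges, rates) for pt in probe_pts)

# Hypothetical binning (GeV/c) and mistag rates, for illustration only.
bin_edges = [350.0, 400.0, 500.0, 700.0]
rates = [0.02, 0.04, 0.05]
# Hypothetical probe-jet pT values for four loosely selected events.
probe_pts = [360.0, 450.0, 450.0, 650.0]
n_ntmj = ntmj_estimate(probe_pts, bin_edges, rates)
```

Here the four events contribute weights of 0.02, 0.04, 0.04, and 0.05, so the predicted yield is 0.15 events.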

The ensemble of jets in the loose pretagged region have, on average, a lower jet mass than
the jets in the signal region. Consequently, the $m_{t\bar{t}}$ spectrum in this sideband is kinematically
biased. To emulate the event kinematics of the signal region, the jet mass of the probe jet is
ignored, and instead it is set to a value randomly drawn from the distribution of jet masses of
probe jets from NTMJ MC events in the range 140 to 250 GeV/$c^2$.
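
One simple way to picture this mass replacement is to draw the new probe-jet mass from a list of simulated masses restricted to the top mass window. The sketch below is an assumption about how such a draw could be implemented, not the procedure used in the analysis; the function name and the template values are hypothetical.

```python
import random

def assign_probe_mass(template_masses, rng, window=(140.0, 250.0)):
    """Draw a probe-jet mass from template masses inside the top mass window."""
    in_window = [m for m in template_masses if window[0] <= m <= window[1]]
    return rng.choice(in_window)

# Hypothetical template of simulated NTMJ probe-jet masses (GeV/c^2).
template = [95.0, 142.0, 150.5, 163.0, 171.0, 188.0, 210.0, 260.0]
rng = random.Random(1)  # fixed seed so the example is reproducible
new_mass = assign_probe_mass(template, rng)
```

The assigned mass is then used in place of the measured probe-jet mass when computing $m_{t\bar{t}}$ for the background prediction.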

This procedure is cross-checked on a NTMJ MC sample to ensure that the methodology achieves 
closure. In this cross-check, half of the events in the MC are used to derive a mistag probabil- 
ity using the above procedure, and then used to predict the expected number of tags for the 
remaining events, which is compared to the observed number of tags in these events. The 
observed and expected number of tags agree within statistical uncertainties. 

Possible biases in the calibration procedure from the presence of a new Z' have also been
investigated in the analysis. For instance, a Z' signal with $m_{Z'} = 3$ TeV/$c^2$ and a width of 30 GeV/$c^2$
contributes less than 1% to the events defined through the loose selection criteria as well as to
the sideband regions used to determine the probability of mistagging jets.

The uncertainty on this procedure is taken as half the difference between the $m_{t\bar{t}}$ distributions
obtained using the modified and unmodified probe-jet masses. Two choices of prior
distribution for the probe-jet mass were investigated: the MC-based prior described above
and a flat prior. The systematic uncertainty estimated with the current method is slightly more
conservative.



Table 1: The number of events observed in the loose selection ($N_{\mathrm{loose}}$), which is used as input
to compute the number of events predicted in the signal region for the mistagged NTMJ background
($N_{\mathrm{NTMJ}}$). Both appear in Eq. (2) and are described in detail in Sec. 3.3. Figure lb shows
the value $P_m(p_T)$ used in Eq. (2). For $N_{\mathrm{NTMJ}}$, the first and second uncertainties are statistical and
systematic, respectively.

                        $m_{t\bar{t}}$ = 0.9-1.1 TeV/$c^2$        $m_{t\bar{t}}$ = 1.3-2.4 TeV/$c^2$
                        1+1              1+2               1+1              1+2
$N_{\mathrm{loose}}$    22015            70545             18401            30253
$N_{\mathrm{NTMJ}}$     443 ± 4 ± 22     1239 ± 6 ± 31     741 ± 6 ± 30     817 ± 6 ± 36



Table 2: Expected and observed number of events in two different tt mass windows for the 1+1
and 1+2 samples. The expected SM tt is taken from MC predictions, and the expected NTMJ
background is derived in Table 1.

                                   $m_{t\bar{t}}$ = 0.9-1.1 TeV/$c^2$     $m_{t\bar{t}}$ = 1.3-2.4 TeV/$c^2$
                                   1+1           1+2              1+1           1+2
Expected SM tt events              69 ± 36       110 ± 62         65 ± 42       24 ± 15
Expected non-top multijet events   443 ± 23      1239 ± 32        741 ± 32      817 ± 38
Total expected events              512 ± 43      1349 ± 70        806 ± 53      841 ± 41
Observed events                    506           1383             809           841



Table 1 provides an estimate for the mistagged NTMJ background for two tt mass windows:
0.9-1.1 TeV/$c^2$ and 1.3-2.4 TeV/$c^2$. The first row corresponds to the number of events observed
in the loose selection ($N_{\mathrm{loose}}$ in Eq. (2)), to which the mistag probability is applied. The second
row corresponds to the number of expected events from the mistagged NTMJ background in
the signal region ($N_{\mathrm{NTMJ}}$ in Eq. (2)). As can be observed in Table 1, the primary uncertainty on the
NTMJ background is from the systematic uncertainty assigned to the procedure for modifying
probe-jet masses.

3.4 Results of event selection 

Observed distributions for 1+1 and 1+2 events in data are compared to the expected backgrounds
in Fig. 5. The NTMJ background expectation determined from data is given by the
yellow (light) filled histograms. The SM tt estimate is shown as red (dark) filled histograms,
and the data are shown as solid black points. The hatched gray regions indicate the total
uncertainty on the backgrounds. Figure 5 also shows for comparison the signal expectation from
MC for several hypothetical Z' signals with masses from 1 to 3 TeV/$c^2$ with a width of 1%, in
the 1+1 and 1+2 samples, but with cross sections taken from the expected limits discussed in
Sec. 5.1.

From Fig. 5, it is clear that the dominant background in this analysis is from NTMJ events
rather than from SM tt production. Selections based on b-quark identification have not yet
been introduced to improve the sensitivity of this search; they must await an improvement in
the performance of b-tagging in merged top jets.

To demonstrate the components of the background estimate, Table 2 lists the number of events
expected from background sources in the 1+1 and 1+2 channels, along with the observed number
of events, for two tt mass windows: 0.9-1.1 TeV/$c^2$ and 1.3-2.4 TeV/$c^2$. The systematic
uncertainties on these values are summarized in Sec. 4.
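
The text does not state how the uncertainties on the two background components are combined, but the "Total expected events" row of Table 2 is reproduced by adding the yields and summing the uncertainties in quadrature, that is, by treating the SM tt and NTMJ uncertainties as uncorrelated. The Python check below makes that assumption explicit; the function name is hypothetical.

```python
import math

def add_in_quadrature(*uncertainties):
    """Combine uncertainties assumed to be uncorrelated."""
    return math.sqrt(sum(u * u for u in uncertainties))

# (yield, uncertainty) for SM tt and NTMJ, copied from Table 2.
table2 = {
    ("0.9-1.1 TeV", "1+1"): ((69, 36), (443, 23)),
    ("0.9-1.1 TeV", "1+2"): ((110, 62), (1239, 32)),
    ("1.3-2.4 TeV", "1+1"): ((65, 42), (741, 32)),
    ("1.3-2.4 TeV", "1+2"): ((24, 15), (817, 38)),
}

totals = {}
for key, ((n_tt, u_tt), (n_ntmj, u_ntmj)) in table2.items():
    # total yield and rounded quadrature-sum uncertainty
    totals[key] = (n_tt + n_ntmj, round(add_in_quadrature(u_tt, u_ntmj)))
    print(key, totals[key])
```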



[Figure 5: two panels, (a) Type 1+1 and (b) Type 1+2, each labeled "CMS, L = 5 fb$^{-1}$, $\sqrt{s}$ = 7 TeV",
with the tt mass (GeV/$c^2$) from 500 to 5000 on the x-axis. Legend: Observed; Non-top multijet;
tt simulation; Stat. ⊕ sys. bkgd uncert.; Z'(1 TeV/$c^2$) σ = 1.0 pb; Z'(1.5 TeV/$c^2$) σ = 0.18 pb;
Z'(2 TeV/$c^2$) σ = 0.06 pb; Z'(3 TeV/$c^2$) σ = 0.03 pb.]

Figure 5: Results for (a) 1+1 and (b) 1+2 event selections and background estimates. The yellow
(light) histograms are the non-top multijet (NTMJ) estimates from data, as described in the
text, and the red (dark) histograms are the MC estimates from SM tt production. The black
points are the data. The hatched gray boxes combine the statistical and systematic uncertainties
on the total background. For comparison, expectations for some Z' hypotheses are shown
for the assumption of 1% resonance width, with cross sections taken from the expected limits
discussed in Sec. 5.1. Also shown are the fractional difference between the data and
the prediction (red circles, with the y-axis on the right of the plot) and the number of
standard deviations ($N_\sigma$) of the observation from the prediction (black histogram,
with the y-axis on the left of the plot), where the binning is adjusted to have at least 20 events in
each bin.



4 Systematic uncertainties 

The sources of systematic uncertainty on the tt invariant mass spectrum fall into three categories:
(i) the determination of the efficiency, (ii) the mistag probability, and (iii) the shape of
the tt invariant-mass distribution. Several sources of systematic uncertainty can simultaneously
affect these three categories, and in such cases the corresponding parameters have to be
varied in a correlated way. The uncertainties on efficiency include uncertainties in the overall
jet-energy scale for tagged jets (~2-4% from standard jet energy corrections, ~3% from the
application of the jet corrections to pruned jets, and ~1% to account for the uncertainty on
the determination of the W mass in data and MC, as described in Sec. 3.2), integrated luminosity
(2.2%), subjet-selection efficiency (~6%, as described in Sec. 3.2), jet energy resolution
(< 1%), and jet angular resolution (< 1%). The trigger uncertainty for 1+1 events is 13% for
$m_{t\bar{t}}$ = 1 TeV/$c^2$, and < 1% for tt masses above 1.5 TeV/$c^2$. The trigger uncertainty is larger for
1+2 events: 20% for $m_{t\bar{t}}$ = 1 TeV/$c^2$, and 3% for $m_{t\bar{t}}$ > 1.5 TeV/$c^2$, as described in Sec. [?]. The
impact of changes in parton distribution functions [53] is found to be negligible.

Similar uncertainties affect the tt continuum background and are estimated in the same manner.
In addition, the large uncertainty on the renormalization and factorization scales (a factor of
two) is found to have a significant impact on SM tt production, resulting in a 50% variation in
the yield, estimated from MC studies. This is reflected in the uncertainties on the number of tt
events in Table 2. Table 3 provides a summary of this information.

Table 3: Summary of relative systematic uncertainties on signal efficiency for two tt mass windows.
All values are in percent. The central value of the subjet selection scale factor is 0.94; it
is the only scale factor that has a non-unit mean.

Source                           Variation   $m_{t\bar{t}}$ = 0.9-1.1 TeV/$c^2$   $m_{t\bar{t}}$ = 1.3-2.4 TeV/$c^2$
                                             1+1        1+2                       1+1        1+2
MC statistical                               2.0        1.6                       0.7        1.6
Trigger                          See text    13         20                        <1         3
Jet energy scale                 ≈ ±5        19         19                        2          2
Subjet efficiency scale factor   ±6          6          6                         6          6
Luminosity                       ±2.2        2.2        2.2                       2.2        2.2
Total                                        24         28                        7          8
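
The "Total" row of Table 3 appears to be the quadrature sum of the individual sources, treating them as uncorrelated. This is an inference from the tabulated numbers, not a statement made in the text. For the low-mass window, for example:

```latex
\begin{align*}
\text{1+1:}\quad & \sqrt{2.0^2 + 13^2 + 19^2 + 6^2 + 2.2^2} = \sqrt{574.84} \approx 24.0 \\
\text{1+2:}\quad & \sqrt{1.6^2 + 20^2 + 19^2 + 6^2 + 2.2^2} = \sqrt{804.40} \approx 28.4
\end{align*}
```

The high-mass totals (7 and 8) follow in the same way, with the subjet efficiency scale factor as the largest single term.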



The uncertainties on the mistagged NTMJ background include the statistical uncertainty on
the sample after loose selection, to which the mistag probability is applied; the statistical
uncertainty on the mistag probability itself, ranging from < 1% at 1 TeV/$c^2$ to ~10% at 3 TeV/$c^2$,
as seen in Fig. lb; and the systematic uncertainty on the mistag probability application, as
described in Sec. 3.3, which is in the range of 1 to 5% depending on the tt mass. The total background
uncertainty is ~5% for the low-mass region, dominated by the systematic uncertainty,
and ≈ 100% for the high-mass region, dominated by statistical uncertainty associated with the
sample after loose selection.



5 Statistical treatment 

The main result of this analysis is the fit to data assuming a resonance hypothesis for the new 
physics, in which a likelihood is fit to the expected tt invariant mass distributions for signal and 
background. The second result corresponds to a counting of events relative to some generic 
model of an enhancement of the tt continuum assuming the SM efficiency for the additional 
contribution. These two results are discussed below. 



5.1 Resonance analysis 

The first analysis uses a resonant signal hypothesis to search for localized contributions to the
$m_{t\bar{t}}$ spectrum. A cross-check of the analysis is performed by counting the number of events in a
mass range defined for each resonant mass value. Two such mass ranges are shown in Tables 1-3.
In the main resonance analysis and the cross-check, the number of observed events, $N_{\mathrm{obs}}$, is
compared to the expectation $N_{\mathrm{exp}}$, based on the production cross section $\sigma_{Z'}$, branching
fraction $B(Z' \to t\bar{t})$, signal reconstruction efficiency $\epsilon$, integrated luminosity $L$, and the predicted
number of background events $N_B$:

$$N_{\mathrm{exp}} = \sigma_{Z'} \times B(Z' \to t\bar{t}) \times \epsilon \times L + N_B. \qquad (3)$$

The likelihood is computed using the Poisson probability to observe $N_{\mathrm{obs}}$, given a mean of
$N_{\mathrm{exp}}$, with uncertain parameters $L$, $\epsilon$, and $N_B$, all defined through log-normal priors based
on their mean values and their uncertainties. The shapes and normalizations of signal and
background distributions are varied within their systematic uncertainties until the likelihood is
maximized. This procedure effectively integrates over the parameters describing the systematic
uncertainties, thereby reducing their impact.
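
A single-bin version of this likelihood can be sketched in Python as below. It evaluates Eq. (3) and the Poisson term, with log-normal constraints on $L$, $\epsilon$, and $N_B$, at their nominal values only; it does not perform the maximization over nuisance parameters or the binned shape fit used in the analysis. The background and observed counts are taken from Table 2 (1+1 channel, 0.9-1.1 TeV/$c^2$ window). The signal efficiency of 0.02, its 24% uncertainty, the log-normal width convention, and all function names are assumptions made for illustration.

```python
import math

def expected_events(sigma_times_b_pb, efficiency, lumi_pb, n_bkg):
    """Eq. (3): N_exp = (sigma x B) x eps x L + N_B."""
    return sigma_times_b_pb * efficiency * lumi_pb + n_bkg

def log_poisson(n_obs, mean):
    """Log of the Poisson probability to observe n_obs given mean."""
    return n_obs * math.log(mean) - mean - math.lgamma(n_obs + 1)

def log_lognormal(value, nominal, rel_unc):
    """Log of a log-normal density (up to a constant) with median `nominal`.

    Assumption: the width parameter is ln(1 + rel_unc).
    """
    width = math.log(1.0 + rel_unc)
    pull = math.log(value / nominal) / width
    return -0.5 * pull * pull - math.log(value)

def log_likelihood(sigma_times_b_pb, n_obs, lumi, eff, n_bkg, nominal, rel_unc):
    """Poisson term plus log-normal constraint terms on L, eps and N_B."""
    total = log_poisson(n_obs, expected_events(sigma_times_b_pb, eff, lumi, n_bkg))
    for name, value in (("lumi", lumi), ("eff", eff), ("bkg", n_bkg)):
        total += log_lognormal(value, nominal[name], rel_unc[name])
    return total

# 5 fb^-1 = 5000 pb^-1; the efficiency value is a made-up placeholder.
nominal = {"lumi": 5000.0, "eff": 0.02, "bkg": 512.0}
rel_unc = {"lumi": 0.022, "eff": 0.24, "bkg": 43.0 / 512.0}
for sigma_b in (0.0, 1.0, 2.0):
    print(sigma_b, log_likelihood(sigma_b, 506, 5000.0, 0.02, 512.0, nominal, rel_unc))
```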

The upper limits at 95% confidence level (CL) on the product of the Z' production cross section
and the branching fraction to the tt final state are extracted for the combination of the 1+1 and
1+2 tt mass spectra, as a function of $m_{t\bar{t}}$ in a range from 1 to 3 TeV/$c^2$ with a 0.1 TeV/$c^2$ increment.
A CLs method [54-56] is used to extract the 95% CL upper limits, with the posterior based on
a Poisson model for each bin of the distribution.
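
To illustrate the CLs construction itself, the Python sketch below computes a 95% CL upper limit on the signal yield for a single-bin counting experiment, using the number of observed events as the test statistic and ignoring systematic uncertainties. This is a textbook simplification, not the binned procedure of the analysis, and it is only numerically suitable for small event counts. The function names and the example inputs are hypothetical.

```python
import math

def poisson_cdf(n_obs, mean):
    """P(N <= n_obs | mean), summed term by term (fine for small counts)."""
    term = math.exp(-mean)
    total = term
    for k in range(1, n_obs + 1):
        term *= mean / k
        total += term
    return total

def cls(n_obs, signal, background):
    """CLs = CL_{s+b} / CL_b for a counting experiment."""
    cl_sb = poisson_cdf(n_obs, signal + background)
    cl_b = poisson_cdf(n_obs, background)
    return cl_sb / cl_b

def upper_limit(n_obs, background, cl=0.95, hi=100.0, tol=1e-6):
    """Signal yield at which CLs drops to 1 - cl, found by bisection."""
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # CLs decreases monotonically as the signal hypothesis grows
        if cls(n_obs, mid, background) > 1.0 - cl:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(upper_limit(0, 3.0))   # zero observed events, hypothetical background of 3
print(upper_limit(5, 3.0))
```

A limit on the signal yield obtained this way would be converted into a limit on the cross section times branching fraction by dividing by $\epsilon \times L$, following Eq. (3).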



Figure 6 shows the observed and expected upper limits for: (a) a Z' hypothesis with $\Gamma_{Z'}/m_{Z'}$ = 1%,
(b) a Z' hypothesis with $\Gamma_{Z'}/m_{Z'}$ = 10%, and (c) a Randall-Sundrum Kaluza-Klein gluon
hypothesis. Also shown are the theoretical predictions for several models to compare to the
observed and expected limits. In Fig. 6a, predictions are also shown for a topcolor Z' model
based on Refs. [?], updated to $\sqrt{s}$ = 7 TeV in Ref. [?], with $\Gamma_{Z'}/m_{Z'}$ = 1.2% and $\Gamma_{Z'}/m_{Z'}$ = 3%,
compared to limits obtained assuming a 1% width. Higher-order QCD corrections to the
Z' production cross section were accounted for through a constant K-factor, computed to be
1.3. The same Z' model, but for $\Gamma_{Z'}/m_{Z'}$ = 10%, is compared to the limits obtained assuming a
10% width in Fig. 6b. Finally, in Fig. 6c, the prediction of the Randall-Sundrum Kaluza-Klein
gluon model from Ref. [12] is compared to the limits from data.

Using the upper limits for the Z' with 1% width, mass ranges for two Z' models are excluded,
as seen in Fig. 6a. First, the mass range 1.0-1.6 TeV/$c^2$ is excluded for a topcolor Z' with width
$\Gamma_{Z'}/m_{Z'}$ = 3%. Second, two mass ranges, 1.3-1.5 TeV/$c^2$ and a narrow range (smaller than the
mass increment) close to 1.0 TeV/$c^2$, are excluded for the same topcolor Z' with width $\Gamma_{Z'}/m_{Z'}$ =
1.2%. Similarly, using the upper limits for the Z' with 10% width, as seen in Fig. 6b, the mass
range 1.0-2.0 TeV/$c^2$ is excluded for a topcolor Z' with width $\Gamma_{Z'}/m_{Z'}$ = 10%.



Finally, as seen in Fig. 6c, upper limits in the range of 1 pb are set on $\sigma_{g'} \times B(g' \to t\bar{t})$ for
$m_{g'}$ > 1.4 TeV/$c^2$ for a specific Randall-Sundrum gluon model [12], which exclude the existence
of this particle with masses between 1.4 and 1.5 TeV/$c^2$, as well as in a narrow region (smaller than
the mass increment) close to 1.0 TeV/$c^2$.

The resonant analysis is cross-checked by counting events in specified mass windows of $m_{t\bar{t}}$.
The signal region is defined by a window in $m_{t\bar{t}}$, and the background estimates from Figs. 5a
and 5b are integrated over this range. The results obtained from this cross-check are consistent
with the analysis of the $m_{t\bar{t}}$ spectrum, but are not as sensitive. Table 2 shows the number

Figure 6: The 95% CL upper limits on the product of production cross section ($\sigma$) and branching
fraction ($B$) of hypothesized objects into tt, as a function of assumed resonance mass. (a)
Z' production with $\Gamma_{Z'}/m_{Z'}$ = 1% (1% width assumption) compared to predictions based on
Refs. [?] for $\Gamma_{Z'}/m_{Z'}$ = 1.2% and 3.0%. (b) Z' production with $\Gamma_{Z'}/m_{Z'}$ = 10% (10% width
assumption) compared to predictions based on Refs. [?] for a width of 10%. (c) Randall-Sundrum
Kaluza-Klein gluon production from Ref. [12], compared to the theoretical prediction
of that model. The ±1 and ±2 standard deviation (s.d.) excursions are shown relative to
the results expected for the available luminosity.






Table 4: Expected number of events with $m_{t\bar{t}}$ > 1 TeV/$c^2$ from SM tt and non-top multijet
backgrounds, along with their total, compared to the observed number of events. The efficiency
for SM tt production, which is used in the limit setting procedure described in the text, is shown
on the final line.

                                   1+1                          1+2
Expected SM tt events              194 ± 106                    129 ± 80
Expected non-top multijet events   1546 ± 45                    2271 ± 130
Total expected events              1740 ± 115                   2400 ± 153
Observed events                    1738                         2423
tt efficiency                      (2.5 ± 1.3) × $10^{-4}$      (1.6 ± 1.0) × $10^{-4}$


of events observed and expected in two mass windows for the 1+1 and 1+2 channels: 0.9-1.1 TeV/$c^2$,
corresponding to the 1 TeV/$c^2$ Z' sample, and 1.3-2.4 TeV/$c^2$, corresponding to the
2 TeV/$c^2$ Z' sample. The observed 95% CL upper limits on signal cross section change from 1.0
to 2.0 pb at 1 TeV/$c^2$, from 0.10 to 0.26 pb at 2 TeV/$c^2$, and from 0.02 to 0.05 pb at 3 TeV/$c^2$. Most
of the difference is attributed to a better statistical handling in the resonance analysis of the
bins with large background in the mass distribution.

5.2 tt enhancement analysis 

In the second analysis, a general enhancement is assumed in modeling the tt mass spectrum due
to some new phenomenon (NP), assuming the same signal efficiency as for the SM tt continuum,
as described in Refs. [22, 23]. The limit on any possible enhancement is presented in terms
of a variable $S$, the ratio of the integral of the $m_{t\bar{t}}$ distribution above 1 TeV/$c^2$ corresponding to
SM tt production plus a contribution from some NP, to that from SM tt production alone:

$$S = \frac{\int_{m_{t\bar{t}} > 1\,\mathrm{TeV}/c^2} \frac{d\sigma_{\mathrm{SM+NP}}}{dm_{t\bar{t}}}\, dm_{t\bar{t}}}{\int_{m_{t\bar{t}} > 1\,\mathrm{TeV}/c^2} \frac{d\sigma_{\mathrm{SM}}}{dm_{t\bar{t}}}\, dm_{t\bar{t}}}. \qquad (4)$$

The events used for setting the limit are selected to have reconstructed $m_{t\bar{t}}$ > 1 TeV/$c^2$, which
does not correspond to the same range for the true mass. Consequently, a correction factor
must be applied to the reconstructed tt mass distribution to estimate the true mass distribution.
This is estimated by dividing the number of simulated tt events with a reconstructed mass
> 1 TeV/$c^2$ by the number of simulated tt events with a true mass > 1 TeV/$c^2$. This ratio is 1.24
± 0.08 for the Type 1+1 analysis and 1.41 ± 0.11 for the Type 1+2 analysis. These ratios are
applied as multiplicative factors to obtain the yields for the true tt mass above 1 TeV/$c^2$. These
factors do not affect the quantity $S$ since they cancel in the ratio.

The approximate NNLO cross section for inclusive tt production is taken to be 163 pb [48-50].
The efficiency for Type 1+1 events, relative to inclusive SM tt production, is found to
be (2.5 ± 1.3) × $10^{-4}$, and for Type 1+2, the efficiency is (1.6 ± 1.0) × $10^{-4}$. The numbers of
observed and expected events for the SM tt and NTMJ backgrounds are shown in Table 4,
along with these efficiencies. Following the statistical procedure outlined above, the enhancement
factor to the tt production cross section for $m_{t\bar{t}}$ > 1 TeV/$c^2$ ($S$ in Eq. (4)) must
be < 2.6. The a priori expectation is for this limit to lie in the interval 2.0-3.5 at 68% CL, and
1.7-5.5 at 95% CL, with a most probable value of 2.5.
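
As a rough consistency check, the expected SM tt yields can be estimated as cross section × integrated luminosity × efficiency. The Python sketch below uses the rounded values quoted in the text and in Table 4, with 5 fb$^{-1}$ taken as 5000 pb$^{-1}$. It gives about 204 and 130 events, close to the 194 and 129 listed in Table 4; the residual difference presumably reflects rounding of the quoted efficiencies and luminosity, which is an assumption rather than something stated in the text.

```python
SIGMA_TT_PB = 163.0                            # approximate NNLO inclusive tt cross section
LUMI_PB = 5000.0                               # 5 fb^-1 expressed in pb^-1 (rounded)
EFFICIENCY = {"1+1": 2.5e-4, "1+2": 1.6e-4}    # central values from Table 4
TABLE4_SM_TT = {"1+1": 194, "1+2": 129}        # expected SM tt events from Table 4

def expected_yield(sigma_pb, lumi_pb, efficiency):
    """Expected number of selected events: sigma x L x efficiency."""
    return sigma_pb * lumi_pb * efficiency

yields = {ch: expected_yield(SIGMA_TT_PB, LUMI_PB, eff) for ch, eff in EFFICIENCY.items()}
for ch, n in yields.items():
    print(ch, round(n, 1), "vs Table 4:", TABLE4_SM_TT[ch])
```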

6 Summary 

In summary, a search is presented for a massive resonance (Z') decaying into a tt pair in the
all-hadronic final state using an integrated luminosity of 5 fb$^{-1}$ collected with the CMS detector
at 7 TeV. A Z' with standard-model couplings is considered as a model of such a resonance.
Two widths are considered ($\Gamma_{Z'}/m_{Z'}$ = 1% and 10%), as well as an additional model of a
Randall-Sundrum Kaluza-Klein gluon. The search focuses on high tt masses that yield collimated
decay products, partially or fully merged into single jets. The analysis therefore relies
on new developments in the area of jet substructure, thereby providing suppression of non-top
multijet production.

No excess of events is observed over the expected yield from SM background sources. Upper 
limits in the range of 1 pb are set on the product of the Z' cross section and branching fraction 
for a topcolor Z' modeled for several widths, as well as for a Randall-Sundrum Kaluza-Klein 
gluon. 

Finally, results are presented for any generic source of new phenomena with the same reconstruction
efficiency as standard-model tt production, and limits are placed on any enhancement
to the cross section from such a contribution. In particular, the tt production cross section (in
total) must be less than a factor of 2.6 times that of the SM expectation for $m_{t\bar{t}}$ > 1 TeV/$c^2$. This
constrains generic enhancements to standard-model tt production, which can be used to check
models that seek to interpret the forward-backward tt production asymmetry observed at the
Tevatron as a sign of new physics.

This is the first publication to constrain tt resonances in the kinematic region of $m_{t\bar{t}}$ > 1 TeV/$c^2$,
and is also the first work to use the jet-substructure tools described above.